A survey of historical document image datasets

Abstract

This paper presents a systematic literature review of image datasets for document image analysis, focusing on historical documents, such as handwritten manuscripts and early prints. Finding appropriate datasets for historical document analysis is a crucial prerequisite to facilitate research using different machine learning algorithms. However, because of the very large variety of the actual data (e.g., scripts, tasks, dates, support systems, and amount of deterioration), the different formats for data and label representation, and the different evaluation processes and benchmarks, finding appropriate datasets is a difficult task. This work fills this gap, presenting a meta-study on existing datasets. After a systematic selection process (according to PRISMA guidelines), we select 65 studies that are chosen based on different factors, such as the year of publication, the number of methods implemented in the article, the reliability of the chosen algorithms, the dataset size, and the journal outlet. We summarize each study by assigning it to one of three pre-defined tasks: document classification, layout structure, or content analysis. We present the statistics, document type, language, tasks, input visual aspects, and ground truth information for every dataset. In addition, we provide the benchmark tasks and results from these papers or recent competitions. We further discuss gaps and challenges in this domain. We advocate for providing conversion tools to common formats (e.g., COCO format for computer vision tasks) and always providing a set of evaluation metrics, instead of just one, to make results comparable across studies.

Similar resources

Multispectral Image Restoration of Historical Document Images

Culture is preserved through various documents, which are part of civilization and heritage. Because ancient scripts are disappearing and often only single copies of a document remain for future generations, archiving these documents digitally is the solution to these problems. In this paper, the aim is to restore the historical document from tears, stains and poor visibility...

Restoration of Degraded Historical Document Image

Restoration plays a very important role in enhancing degraded, noisy images. Numerous algorithms have been designed to enhance degraded images. Since image processing algorithms are subjective, not all of the algorithms that have been developed will address every type of degradation. To address a specific type of problem, a suitable algorithm needs to be selected. In this paper a combination of spatial...

A structural survey of the Polish posters

Illustration has many capabilities

A Performance Evaluation Methodology for Historical Document Image Binarization

Document image binarization is of great importance in the document image analysis and recognition pipeline since it affects further stages of the recognition process. The evaluation of a binarization method aids in studying its algorithmic behaviour and verifying its effectiveness by providing qualitative and quantitative indication of its performance. This work concerns a pixel-based binarizat...

A Unified Framework for Degraded Thai Historical Document Image Restoration

Binarization is the key process for restoring degraded historical document images. In this paper, a framework for degraded Thai historical document image restoration is proposed. The proposed framework consists of three stages: an image filtering stage, a local-based thresholding stage, and a cluster analysis stage. The image filtering stage aims to eliminate noise by using Wiener filte...

Journal

Journal title: International Journal on Document Analysis and Recognition

Year: 2022

ISSN: 1433-2833, 1433-2825

DOI: https://doi.org/10.1007/s10032-022-00405-8